General Remarks
- Group work is NOT allowed in the lab. You have
to work alone. Discussions with colleagues (e.g., in the forum) are
allowed but the code has to be written alone.
- Be sure to check the Tricky Parts section for questions!
- Further reading is provided under tutorials
- In order to guarantee that all students can work fluently, do
not waste resources on the lab server (i.e., do not let your RMI
servers run unnecessarily, kill your running zombie processes,...)
Submission Guide
Submission
- You must upload your solution using the Teaching Tool before the submission deadline: 07.12.2007, 18:00 cet. - the deadline is hard!
- You are responsible for submitting your solution in time!
Please make sure that your upload was successful (i.e., you should be
able to download your solution in the teaching tool - as the tutors
will do during the interview). If you do not submit, you won't get any
points!
- You have to upload your solution as a ZIP file. Please
submit only the sources of your solution (not the compiled class files
and no third-party libraries).
- Before the submission deadline, you can upload your solution
as often as you like. Remember that only your last submission is taken
into consideration!
- Your submission must compile and run in our lab environment. Please test before you submit.
- Please provide ant-tasks to ease testing. You may use our ant template.
Interviews
- After the submission deadline, there will be a mandatory
interview (Abgabegespräch). You must register for a time slot to the
interviews using the Teaching Tool.
- You can do the interview only if you have submitted your solution before the deadline!
- The interviews start two weeks before the main "interview" - week. If you pass before 05.12.2007 you will earn 2 bonus points!
- The interview will take place in the DSLab. During the interview, you will be asked about the solution that you have uploaded (i.e., changes after the deadline will not be taken into account!). In the interview you need to explain your code, design and architecture in detail.
- Remember that you can do the interview only once!
Description
In this assignment you will learn:
- the basics of a simple distributed object technology (RMI)
- how to lookup objects with a naming service
- how to implement a callback with RMI
- how to implement a replication mechanism based on RMI
This year you shall use this features to implement a simple(!) distributed database system.
Overview
Replication is a formidable means to either increase the availability or the performance
of a system.
We use replication to store the data of the database in several files to increase its availability.
To access the replica's in a consistent and transparent manner you have to build a facade class that
accesses the replica's via the RMI protocol. This facade class shall offer a method for each
function of the database. This facade class is directly instantiated within your testing code and clients.
Database functionality
You shall realize simple database functionality that stores tables
and records (inside files).
The focus of this lab is NOT on the database functionality. Hence it
suffices if the functionality is implemented in a simple way.
The required functionality is listed below. In
addition to these descriptions we also provide similar SQL statements
(the function description resemble SQL commands - however, you need NOT support SQL syntax, you just have to provide appropriate interfaces that can be called by developers).
- Create Table creates a new table. A table
in our database stores various records. Multiple tables with different
names shall be supported. A table has multiple columns that store
character strings. Each column has a name and a length, ie the number
of characters that can be stored. You do not need to take care if
columns are used as key columns. You can assume that the columnname is
as small as the number of characters that can be stored. The order of
the columns is not important. The columnname will only need to take
alphanumeric characters. You need not consider storing newlines or
tabulators in the columns (as well as the column name). You need not
check if non-alphanumeric characters are provided as arguments. If you
want to get all points for this lab assignment you should consider
storing a version number (of the table data) as you need it in the
replication part.
- Insert Into Table inserts data into a table.
The user has to provide sufficient string arguments for all columns
(you can also provide an associative data structure such as Hashtables
that take combinations of columnnames and new-values).
- Select From Table selects data from a table. Your implementation of this function shall take one columnname-columnvalue
pair as a parameter to identify records within the table: all records
with an entry that equals the given value in the identified column
shall be returned. If a special column name is provided (eg.:*) then
all records shall be returned.
- Update Table updates a particular record. As the select function takes one columnname-columnvalue
pair as a parameter to identify the records that shall be updated. The
remaining arguments required by your update function identify only new
values of the identified records. The other columns' values shall not
be modified (of course these values can be rewritten with the previous
values).
- Delete From Table deletes all records that are identified by one columnname-columnvalue pair.
There are different possibilities how you store and access the database data:
- you can use Java serialization to store the table's data in
a single file. The advantage is that this is very easy to implement.
The disadvantage is that serialized data does not typically produce
human readable files. Hence, you shall consider to overwrite the
toString method to debug your implementation.
- you can use Java's RandomAccessFile (java.io.RandomAccessFile)
class to do the access. This produces files that have a fixed-length
record structure. This requires more implementation effort than the
serialization variant. However, the advantage is that the files may be
human-readable. The RandomAccessFile
class provides a method seek that allows to move a file pointer to a
certain position. This position can be calculated by considering the
column lengths and the record number. Typically the first line of the
record shall contain information about the structure (ie, columnnames
and column lenghts).
- principally other persistence mechanisms are possible, too
(eg. reading a file stream and writing the whole stream, ...). However,
you should stick to the two possibilities mentioned above.
- support to store the data of a particular database instance
in its own directory. You can parameterize this with a constructor
parameter. You will need this in the second part of the assignment.
You have to check only the following cases for wrong user input and throw an exception.
- invalid table name: only alphanumeric characters are
allowed for a table name, or an unknown table is accessed. Why do we
restrict this to alphanumeric characters? Regardless how you store the
table's data it will be stored in the file system. The easiest way to
do this is to use files that have the same name as the table. Table
names that include forward slashes or other control characters may be
problematic when creating or opening a file.
- invalid column name: for select, update, delete a criteria
has to be provided. If an invalid column for this criteria is provided
then throw an exception.
- invalid data: any case that new data is invalid (column data
is longer than the maximum length, invalid column names, data is not
provided for all columns in the insert, ...)
You shall implement your own Exception
class. However, we recommend that you use an unchecked exception, ie.
the exception class extends the RuntimeException
class. These exceptions are not checked by the Java compiler and you do
not need an appropriate throws clause.
The advantage of these unchecked exceptions is that the implementation
code can throw any exception without requiring the client code know
every potential exception.
We highly recommend that you build a test client to verify the functionality of your database.
Build a Remote Database
The general idea is to make your database remotely accessible via RMI (Remote Method Invocation).
- Design an remote interface, ie include appropriate methods to make the database functionality accessible by remote clients.
- Implement also an RMI (Remote Method Invocation) server to
implement the remote interface. You can either implement the database
functionality directly within the RMI server, or you can implement an
additional server class that accesses the database functionality inside
another class.
- Your server shall be started with a particular parameter that
identifies the server. Register this instance of your RMI server within
an RMI registry with the identification parameter.
You find all required information about
RMI in the Java Remote Method Invocation tutorial. The link is given
below. If you follow this tutorial you should be able to make your
database remoteable.
Some additional hints:
- When you test your implementation at the server you have to
bind your server to the global rmi registry with a uniquely defined
name. To not interfer with other students you should bind to the
registry with something like 'DBServerXXX-Y' whereas XXX is your lab id, y is a running number you can assign yourselves.
- Start one or multiple implementations and register them at the registry.
- Consider also to unbind from the registry when you stopping your process.
It is mandatory that you build a test
client that verifies all the functions that are available through your
remote interface. This means: build a JUnit test client that has a test method for each of the available remote interface methods.
Callback
Define an additional interface that acts as a notification callback,
ie. its single method gets called whenever a database change happens
(you don't need to consider creation of a table). In Java classes that
implement such notification callbacks are called listeners
- you can find more information about RMI Callbacks in the link below.
The single method shall be called with the table name, the cause of the
database change (delete,insert,update), and the modified data.
Enhance the remote interface for the database access
to add and remove such listeners (more than one listener may be
registered at the same time) for a particular table. Whenever a change
happens to the database call the notification method of ALL registered
listeners.
To let the listener work in a remote setting you have to:
- build an implementation class that implements your callback interface. Think about the necessary action to let the JUnit
test client verify that the appropriate notification has been called.
As a debugging aid you may output the notification data to the console.
- export an instance of this implementation class. You don't
need to register this instance at the RMI registry (and you should not
do this). Unexport the instance of the listener's implementation class
when it is no longer needed.
- add the listener instance to your remote database access via your enhanced remote interface.
Enhance the test client that shows that a
modification of a database accessible via RMI leads to an invocation of
an RMI-exported listener. Doing these tests locally without RMI is nice
but will not give you the points.
Replication
In this second part you have to build a replicated database server. Gifford's scheme shall be used to keep
the access to the replicas consistent. To support this version numbers shall be assigned to data files (exactly one version number for one table).
Gifford's scheme works as follows:
To read a file that has N replicas a so-called read quorum is assembled, i.e. a subset of NR servers
among these N replicas. These NR may be chosen arbitrarily.
To modify a file a write quorum of NW servers is built in the same way as the read quorum.
The two values NR and NW need to satisfy two constraints:
- NR + NW > N
- NW > N/2
The following figure gives an example of
Gifford's scheme. The first three servers have been selected as the
write quorum, the last two servers have been selected as a read quorum.
Only if the above constraints are satisfied then at
least one server of the read quorum will lead to recent data.
Similarly, if the constraints are satisfied then always one write
server will have recent data.
You shall build a client-side facade class that realizes this Quorum Based replication scheme.
This class is initialized with a set of N replicas (these might be
already looked-up RMI proxies, but these might just be the names of the
RMI services, the lookup for the RMI proxies has then to be done
later), and the number NR and NW. Verify that the
constraints described above are satisfied. In case of a contradition
react accordingly, ie throw an exception.
For each database access either a read quorum or a
write quorum (or both) are created: select randomly (i.e. with a random
number generator) the appropriate subset of servers accordingly. This
subset shall be selected at each read or write request.
For reading a replicated value all servers of the
read quorum need to be contacted. The data of the server with the
highest version number is chosen. When data gets updated/written all
servers in the write quorum are asked for the version number of the
data.
The data of all the servers in the write quorum are updated. The new
version number equals the version number of the server in the write
quorum with the highest version number plus 1.
Important Remark: there is only one version number for each table. If
you update data of a server with a smaller version from a server with a
higher version then you have to replicate ALL records from the server
with the higher version.
This constraint needs always be valid: identical version numbers
implies identical data.
Test Client
For testing (and demonstrating) that your distributed database works
you shall start multiple RMI servers that provide access to an instance
of your database. These database instances shall write to different
file system locations.
To test your implementation you shall write one (or multiple if you
like) JUnit test classes that include:
- a test method that creates a new table in each of your
distributed databases. Fill it with some test data (the same data in
each replicated database). You can hardcode the names of your
databases.
- a test method for each of the select,insert,update,delete
functionalities that connects to one (or all) of your remote database
instances and use the corresponding functionality. Use the select
functionality to verify if data has really changed.
- write a test method to demonstrate that changing the database
invokes the callback. You can easily show this by letting the listener
class store the change cause and the changed data.
To test the implementation of Gifford's scheme write following separate test class:
- connect to a fixed number of servers and create one
instance of the facade class that realizes the quorum based access with
a correct initialization of NR and NW (select appropriate numbers for NR and NW).
- write test cases for the replicated insert,update and delete methods. Show that the newly updated data is available in just NW
of all server instances by connecting to these server instances
directly (instead of the facade class). This can easily be done if you
have remote methods to retrieve the version number.
Hints Tricky Parts
Following hints will be useful when dealing with RMI:
- If you work at home you have to start your own rmiregistry (the command is called 'rmiregistry').
- Please conform to the suggested naming scheme above: DBServerXXX-Y, whereas XXX is your lab id, and Y is a number (or character) you can assign to your server.
- You have to start your server with setting the
java.rmi.server.codebase property to a classpath where your compiled
remote interfaces reside. Java Properties can be set on java startup
with java -Dpropertyname=value.
- Typically the codebase has to be set with an absolute path starting with the url prefix file:///classpath, eg. java.rmi.server.codebase=file:///home/users/user123/lab2/build/
- If the codebase refers to a directory there must be a final "/" character.
- If you are using Windows providing the codebase is a little
bit tricky because of the backslashes and spaces inside names (the same
is valid if you use spaces in Unix file names): first you have to
convert all backslashes to forward slashes. Second you have to replace
all space characters with %20. For instance: file:///C:/Documents%20and%20Settings/...
- RMI servers will normally block until the process is killed.
If you use ant to start your server(s) then you must include a fork
attribute in the java command. Otherwise the server process will be
killed when ant finishes processing the build.xml file.
- For using the Java 5 automatic proxy generation feature you must not use the static exportObject method (inside class UnicastRemoteObject) that returns java.rmi.server.RemoteStub. Instead you must use the exportObject that returns java.rmi.Remote.
If you use the wrong method then the automatic proxy generation
facility will NOT work. If you provide 0 for the port then a free port
is assigned to your server.
- you may also extend your server class directly from UnicastRemoteObject.
Then you don't need to export your object directly. However the general
disadvantage of this approach is that you waste the luxury of single
inheritance in Java to implementing RMI (via UnicastRemoteObject)
and can't have a super class that is specific to the domain of the
application (eg. Distributed Databases in our example). For the lab you
can choose any of the two options.
- there is only one version number for each table. If you
update data of a server with a smaller version from a server with a
higher version then you have to replicate ALL records (not just the one
being updated,inserted,deleted) from the server with the higher
version.
This constraint needs always be valid: identical version numbers implies identical data.
- if you do not use Java serialization deletion of records
can be easily implemented when you mark the record with a special
symbol (that means this record is deleted) because you do not have to
rearrange the whole file.
Ant Template
For Lab 2 you can extend this Attach: build.xml that supports also Junit.
The task definition entry in this file is appropriate for the combination of ant-1.6.x and JUnit as
installed on the lab servers pasta and pizza.
In addition you will need to include static method suite() in your JUnit test classes:
import org.junit.*;
public class TestClassX {
... test methods
public static junit.framework.Test suite() {return new junit.framework.JUnit4TestAdapter(TestClassX.class);}
}
(If you use ant-1.7 (at your home PCs) you do not need to include this method. However, if you want to start the tests in the lab you should include it).
Important remark: the build.xml file does not include targets to start instances of your servers.
Proposed Tutorials
Java Remote Invocation (RMI) Tutorial
RMI Callbacks
Proposed Literature
Gifford's scheme is described in Tanenbaum's Distributed Systems Books - Principles and Paradigms in
the Section about Replicated-Write Protocols.